Swarm multi-agent reinforcement learning-based pipeline for workload placement

ABSTRACT

Multi-agent reinforcement learning-based workload placement is disclosed. A placement engine is configured to use the state of a system and actual rewards to generate expected rewards that correspond to actions. Agents can take actions for corresponding workloads based on the expected rewards output by the placement engine. This allows workloads to be placed in a manner that conserves power relative to load balancing placement policies while helping avoid service level agreement violations.

FIELD OF THE INVENTION

Embodiments of the present invention generally relate to placing workloads in a computing environment. More particularly, at least some embodiments of the invention relate to systems, hardware, software, computer-readable media, and methods for using infrastructure efficiently to execute workloads while respecting service level agreements (SLAs) and ensuring quality of service (QoS).

BACKGROUND

Cloud computing has several advantages, which include pay-per-use computation from the customer's perspective and resource sharing from the provider's perspective. Using virtualization, it is possible to abstract a pool of computing devices to offer computing resources to users (e.g., consumers or customers) that are tailored to the needs of the users. Using various abstractions such as containers and virtual machines, it is possible to offer computation services without the user knowing what infrastructure is executing the user's code. These services may include Platform as a Service (PaaS) and Function as a Service (FaaS) paradigms.

In these paradigms, the QoS expected by the user may be expressed through SLAs. SLAs often reflect expectations such as response time, execution time, uptime percentage, and/or other metrics. Providers try to ensure that they comply with the SLAs in order to avoid contractual fines and to preserve their reputation as an infrastructure provider.

Providers are faced with the problem of ensuring that they comply with the contractual agreements (e.g., SLAs) to which they have agreed. Providers may take different approaches to ensure they comply with their contractual agreements. In one example, a provider may dedicate a static amount of resources to each user. This presents a couple of problems. First, it is problematic to assume that an application is bounded by one particular resource. Some applications may have an IO (Input/Output) intensive phase followed by a compute-intensive phase. Dedicating some amount of static resources to each user may result in inefficiencies and idle resources. Further, it is possible that the initial allocation of resources may be under-estimated or over-estimated.

Allocating excessive resources may also adversely impact the provider. From the perspective of a single workload, the provider may perform the workload and easily comply with the relevant SLAs. However, the number of users that can be served by the provider is effectively reduced because the amount of spare resources dictates how many workloads can be performed in parallel while still respecting the SLAs. As a result, allocating excessive resources to a single workload impacts the overall efficiency and may limit the number of workloads the provider can accommodate.

While SLAs are often determined in advance of performing a workload, the execution environment is more dynamic. New workloads may compete for computing resources, leading to unplanned demand that may disrupt the original workload planning due to a greater need to share resources, shifting workload priorities, and the overhead associated with context switching.

The challenge facing providers is to provide services to their users in a manner that respects SLAs while minimizing resource usage. Stated differently, providers are faced with the challenge of efficiently using their resources to maximize the number of users.

BRIEF DESCRIPTION OF THE DRAWINGS

In order to describe the manner in which at least some of the advantages and features of the invention may be obtained, a more particular description of embodiments of the invention will be rendered by reference to specific embodiments thereof which are illustrated in the appended drawings. Understanding that these drawings depict only typical embodiments of the invention and are not therefore to be considered to be limiting of its scope, embodiments of the invention will be described and explained with additional specificity and detail through the use of the accompanying drawings, in which:

FIG. 1 discloses aspects of a system configured to place workloads in a system that includes computing resources;

FIG. 2 discloses aspects of multi-agent reinforcement learning for workload placement in an environment;

FIG. 3 discloses aspects of a reward function;

FIG. 4 discloses additional aspects of multi-agent reinforcement learning with a single placement engine;

FIG. 5 discloses aspects of an action space;

FIG. 6 discloses aspects of placing workloads;

FIG. 7 discloses aspects of placing workloads using a placement engine that has been trained using rewards; and

FIG. 8 discloses aspects of a computing device, system, or entity.

DETAILED DESCRIPTION OF SOME EXAMPLE EMBODIMENTS

Embodiments of the present invention generally relate to workload placement and resource allocation. More particularly, at least some embodiments of the invention relate to systems, hardware, software, computer-readable media, and methods for allocating resources using multi-agent reinforcement learning-based systems or pipelines. Example embodiments of the invention further relate to executing workloads while respecting service level agreements (SLAs) and ensuring quality of service (QoS).

SLAs are typically set or determined before a workload is executed. However, the execution of the workload is subject to various issues that may impact the ability of the provider to meet the requirements of the SLAs. Examples of such issues include inadequate knowledge about actual resource requirements, unexpected demand peaks, hardware malfunctions, or the like.

Workloads often have different bottlenecks. Some workloads may be compute-intensive while other workloads may be IO (Input/Output) intensive. Some workloads may have different resource requirements at different points of their execution. As a result, some workloads can be executed more efficiently in certain resource environments, and it may be beneficial, at times, to migrate a workload to a new environment. For example, the execution environment during a compute-intensive phase of a workload may be inadequate for an IO intensive phase of the same workload.

Embodiments of the invention relate to allocating resources as required or, more specifically, to placing and/or migrating workloads in order to comply with SLAs and to efficiently use resources. In one example, allocating resources is achieved by placing workloads on specific resources, which may include migrating the workloads from one location to another. Workloads are placed or migrated such that SLAs are respected, SLA violations are cured, and/or the resources of the provider are used beneficially from the provider's perspective. One advantage of embodiments of the invention is to allow a provider to maximize use of their resources.

Embodiments of the invention are disclosed with reference to resource allocation. Embodiments of the invention may accommodate infrastructure that may or may not allow changes in the number or amount of resources (i.e., the number of cores, amount of memory, number of GPU (Graphics Processing Unit) devices, or the like). In this context, resource allocation may include workload placement and/or workload migration.

FIG. 1 discloses aspects of reinforcement learning based workload placement in a computing environment. A system 100 (e.g., a datacenter or other computing environment) may include resources 122. The resources 122 may include nodes, which are represented by the nodes 110, 112, and 114. Nodes may each include a computing device or system with processors, memory, network hardware, and the like. The nodes 110, 112, and 114 may include physical machines, virtual machines, containers, and the like. Workloads are typically assigned to computers, containers, or virtual machines operating on the nodes 110, 112, and 114.

The resources 122 of the system 100 may be used to perform jobs or workloads. In other words, the system 100 allocates the resources 122 to perform the workloads. Allocating a resource may include placing a workload at a node (e.g., at a virtual machine) and/or migrating a workload from one node to another node or from one virtual machine to another virtual machine.

The following discussion assumes that workloads are performed by virtual machines and that each of the nodes 110, 112, and 114 may support one or more virtual machines. Further, each virtual machine may perform or execute one or more workloads.

Embodiments of the invention ensure that the workloads are placed on virtual machines in a manner that improves the usage of the resources 122 while complying with relevant SLAs.

The system 100 may include or have access to a workload queue 102 that stores workloads, represented by the workloads 104 and 106. When a user submits a workload, the workload may be stored in the workload queue 102 and then placed in the resources 122.

The system 100 may also include a placement engine 108, which may also operate on a node or server. The placement engine 108 may include a machine learning model, neural network, reinforcement learning model, or the like. In one embodiment, the placement engine 108 may include a reinforcement-based model configured to generate placement recommendations or actions for workloads executing in the resources 122. The placement recommendations may have different forms. For example, the placement recommendations may be in the form of a reward if a certain action is performed. The action associated with the highest reward output by the placement engine 108 is typically executed.

FIG. 1 illustrates that a workload 116 has been placed at the node 112 and that workloads 118 and 120 have been placed at the node 114. More specifically, the workload 116 may be performed by a virtual machine instantiated on the node 112, and the workloads 118 and 120 may be performed by one or more virtual machines instantiated on the node 114. These workloads 116, 118, and 120 were placed, in one example, by an agent based on recommendations of the placement engine 108.

The placement engine 108 may evaluate the state of the resources 122 as well as an actual reward associated with the execution of the workloads 116, 118, and 120 to generate new placement recommendations. This may result in the migration of one or more of the workloads 116, 118, and 120 to different virtual machines or to a different portion of the resources 122.

The placement engine 108, once trained, thus makes placement decisions or placement recommendations. Placement decisions or recommendations may include placing a new workload at a node or a virtual machine, moving or migrating a workload from a current node or virtual machine to a different node or virtual machine, and keeping a workload at the same node or virtual machine.

Each of the workloads is associated with an agent in one example. An agent, by way of example, may be a component or engine that operates in a computing environment to perform actions, communication, or the like. An agent may thus generate goals, perform actions, sense the environment, determine the status of a computing system, learn, or the like. FIG. 1 illustrates an agent 130 associated with a workload 118. In one embodiment, each of the workloads 116, 118, and 120 executing in the resources 122 is associated with a different agent. At the same time, all of the agents in the system 100 use the same placement engine 108. This allows swarm behavior for the agents, where each of the agents is associated with a different workload while using the same placement engine 108.
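
For illustration only, the swarm arrangement can be summarized as many agents holding a reference to a single shared engine. The Agent class and the names below are hypothetical, not taken from the figures.

```python
class Agent:
    """One agent per workload; every agent shares one placement engine."""

    def __init__(self, workload_id: str, engine) -> None:
        self.workload_id = workload_id
        self.engine = engine  # the single, shared placement engine

shared_engine = object()  # stand-in for the trained placement engine
swarm = [Agent(w, shared_engine) for w in ("wl-116", "wl-118", "wl-120")]
assert all(a.engine is shared_engine for a in swarm)
```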

FIG. 2 discloses aspects of placing workloads using a reinforcement learning model. In FIG. 2, an agent 202 may be associated with a workload 220 being executed by a virtual machine 218, which may be part of the resources 122. The agent 202 may perform an action 206 (e.g., a placement action) with regard to the workload 220. The action 206 may include leaving the workload 220 at the virtual machine 218 or moving/migrating the workload 220 to the virtual machine 222.

The action 206 is thus executed in the environment 204, which includes the resources 122 or, more specifically, the virtual machines 218 and 222. After execution or during execution of the workload 220, the state 210 and/or a reward 208 may be determined and returned to the agent 202 and/or to the placement engine 212.

In one example embodiment, the reward 208 may be a value that represents the execution of the workload 220 relative to an SLA or an SLA metric. For example, the reward may represent a relationship between the response time (rt) of the workload and the response time specified in the SLA.

An example reward function may be defined as follows:

$f\left( \Delta,\sigma_{L},\sigma_{R} \right) = e^{\frac{-\Delta^{2}}{2\sigma_{L}^{2}}}\ \text{if}\ \Delta > 0,\ \text{otherwise}\ e^{\frac{-\Delta^{2}}{2\sigma_{R}^{2}}}.$

In one example embodiment, Δ is a difference between the SLA and the response time. In one example, σ_L and σ_R define, respectively, how fast the left and right portions of the curve decay.

FIG. 3 discloses aspects of a reward curve. More specifically, the curve 300 is an example of a reward curve where σ_L = 2 and σ_R = 0.75. In this example, there are several possibilities for the value of Δ:

1)  Δ > 0 → SLA > rt: In this case, there is a positive gap between the SLA and the value of the response time (rt). This indicates that resources are being wasted and should provide a positive reward.
2)  Δ < 0 → SLA < rt: In this case, there is a negative gap between the SLA and the response time. This indicates an SLA violation and should provide a negative reward.
3)  Δ = 0 → SLA = rt: In this case, there is no gap between the SLA and the response time. This indicates that the workload fits perfectly with the infrastructure and a maximum reward should be provided.
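
For concreteness, the reward function above can be sketched in a few lines of Python. This is a minimal sketch assuming the piecewise Gaussian form reconstructed above; note that it penalizes violations with rapidly decaying, near-zero rewards rather than strictly negative values, and the names and default parameters (taken from the curve 300) are illustrative.

```python
import math

def reward(delta: float, sigma_l: float = 2.0, sigma_r: float = 0.75) -> float:
    """Asymmetric Gaussian reward over delta = SLA - rt.

    Positive delta (slack) decays slowly with sigma_l; negative delta
    (an SLA violation) decays quickly with the smaller sigma_r, so
    violations are punished more sharply. The maximum reward of 1.0
    occurs at delta == 0, where the workload fits the infrastructure.
    """
    sigma = sigma_l if delta > 0 else sigma_r
    return math.exp(-(delta ** 2) / (2 * sigma ** 2))
```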

In one example embodiment, the state 210 may include or represent one or more of resource usage per virtual machine, resource usage per workload, the state of each workload, the time to completion for each workload, or the like, or combinations thereof. The state 210 may be formed in a type or style of one-hot encoding that allows the state of all resources (e.g., all virtual machines/nodes in the resources 122) to be included. In one example, this one-hot encoding style includes floating point values to represent the state 210. The state 210 may also represent resources (e.g., idle virtual machines) that are not being used. The state 210 allows all agents to have insight into the infrastructure and the state of all resources. In other words, each of the agents can see the environment of its own workload as well as the environments of the other workloads in the system. As previously stated, however, all of the agents share or use the same placement engine 212 in one example.
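
As an illustration of this encoding style, the sketch below flattens per-VM metrics into a single vector of floating point values. The field names and layout are assumptions for illustration, not the patent's exact encoding.

```python
import numpy as np

def encode_state(vms) -> np.ndarray:
    """Flatten per-VM metrics into one float vector so that every agent
    sees the entire infrastructure, including idle virtual machines."""
    features = []
    for vm in vms:
        features.extend([
            vm.cpu_usage,            # resource usage per virtual machine
            vm.memory_usage,
            vm.workload_usage,       # resource usage per workload
            vm.workload_state,       # e.g., 0.0 = idle, 1.0 = running
            vm.time_to_completion,   # normalized estimate, 0.0 when idle
        ])
    return np.asarray(features, dtype=np.float32)
```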

Returning to FIG. 2, the state 210 of the environment 204 and the reward 208 are input into the placement engine 212 (directly or by the agent 202 as illustrated in FIG. 2). The state 210 may include the state of each node or virtual machine included in the resources 122. This information can be represented in a one-hot encoding style. In this example, the reward 208 reflects the actual performance (rt) of the workload 220 at the virtual machine. If using response time (rt) as a metric, the reward reflects the relationship between the actual response time and the SLA response time.

The placement engine 212 may generate a new recommended action for the agent 202 to perform for the workload 220. This allows the agent 202 to continually adapt to changes (e.g., SLA compliance/non-compliance) at the resources 122 and perform placement actions that are best for the workload 220 and/or for the provider, to comply with SLA requirements, efficiently use the resources 122, or the like.

The placement engine 212 may also have a policy 216 that may impact the placement recommendations. The policy, for example, may be to place workloads using a minimum number of virtual machines, to perform load balancing across all virtual machines, or to place workloads using reinforcement learning. These policies can be modified or combined. For example, the policy may be to place workloads using reinforcement learning with some emphasis towards using a minimum number of virtual machines or with emphasis toward load balancing. Other policies may be implemented.
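
One way such a blended policy could be realized is by shaping the reward with weighted terms. This is purely a hypothetical sketch; the bonus terms and weights are assumptions, not the patent's formulation.

```python
def shaped_reward(sla_reward: float, active_vms: int, total_vms: int,
                  w_min_vm: float = 0.1, w_balance: float = 0.0) -> float:
    """Blend the SLA-based reward with optional policy emphases."""
    min_vm_bonus = 1.0 - active_vms / total_vms  # emphasis: fewer active VMs
    balance_bonus = active_vms / total_vms       # emphasis: spread the load
    return sla_reward + w_min_vm * min_vm_bonus + w_balance * balance_bonus
```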

The output of the placement engine 212 may depend on how many actions are available or defined. If a first action is to keep the workload where the workload is currently operating and a second action is to move the workload to a different node, the output of the placement engine may include two anticipated rewards. One of the rewards corresponds to performing the first action and the other reward corresponds to performing the second action. The action selected by the agent 202 will likely be the action that is expected to give the highest reward. As illustrated in FIG. 2, multiple agents 218, 220, and 202 are using the same placement engine 212. In one example, because the resources 122 may include multiple virtual machines, an expected reward may be generated for migrating the workload to each of those virtual machines.
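
In code, acting on the engine's output reduces to an argmax over the vector of expected rewards. The sketch below assumes a NumPy array and two defined actions, but the same function works unchanged for n candidate destinations.

```python
import numpy as np

def select_action(expected_rewards: np.ndarray) -> int:
    """Return the index of the action with the highest expected reward.

    With two actions, index 0 might mean "keep the workload where it
    is" and index 1 "migrate it"; with n candidate virtual machines
    the vector simply grows to n entries, one per destination.
    """
    return int(np.argmax(expected_rewards))

# Example: keeping the workload (index 0) has the higher expected reward.
assert select_action(np.array([0.62, 0.48])) == 0
```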

The placement engine 212, prior to use, may be trained. When training the placement engine 212, workloads may be moved randomly within the resources. At each node, a reward is generated. These rewards, along with the state, can be used to train the placement engine 212. Over time, the placement becomes less random and begins to rely on the output of the placement engine 212, until training is complete.
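
The decaying randomness described above resembles an epsilon-greedy schedule. The patent does not name a schedule, so the sketch below, including the hypothetical engine.predict method, is an assumption for illustration.

```python
import random

def choose_training_action(engine, state, n_vms: int, epsilon: float) -> int:
    """Epsilon-greedy action selection during training (illustrative).

    With probability epsilon the workload is migrated to a random VM
    (index 0..n_vms-1); otherwise the engine's best recommendation is
    used. Decaying epsilon toward a small floor over the course of
    training shifts placement from random to model-driven.
    """
    if random.random() < epsilon:
        return random.randrange(n_vms)           # random migration
    return int(engine.predict(state).argmax())   # engine's recommendation
```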

Thus, the placement engine 212, by receiving the reward 208 and state 210 during training, which may include multiple migrations, some of which may be random, can implement reinforcement learning. Conventionally, reinforcement learning may rely on a Q-table. Embodiments of the invention, however, may provide deep reinforcement learning. More specifically, the placement engine 212 may include a neural network that allows the experiences of the agents, along with random migrations, to train the placement engine 212. One advantage of the placement engine 212 is that the placement actions or recommendations can be mapped to a much larger number of states compared to a conventional Q-table, which is essentially a lookup table, and that the mapping can be learned.
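
As a concrete but hypothetical sketch, such a deep placement engine could be a small PyTorch network that maps the encoded state to one expected reward (Q-value) per candidate virtual machine, replacing the Q-table lookup; the layer sizes are illustrative.

```python
import torch
import torch.nn as nn

class PlacementEngine(nn.Module):
    """Map a state vector to one expected reward per candidate VM."""

    def __init__(self, state_dim: int, n_vms: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_vms),  # one output per candidate VM
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)
```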

FIG. 4 discloses aspects of a system configured to dynamically place workloads in the context of multiple agents. FIG. 4 illustrates agents 402 that perform actions relative to corresponding workloads placed in the environment 404 (e.g., the resources 122). In this example, each device (e.g., each virtual machine) has a state. In this example, the state 410 includes the state of each virtual machine in the environment 404. The rewards 408 correspond to the rewards determined at each of the devices or virtual machines. The placement engine 412, when making recommendations, may consider all rewards collectively or may consider the rewards associated with a specific agent when generating a placement recommendation.

As agents operate to place workloads, the rewards 408 can be shared or not shared. When the rewards 408 are shared, the agents 402 are collaborative. When the rewards 408 are not shared, the agents 402 are competitive. In both cases, the agents rely on the same placement engine 412. The inputs to the placement engine 412 include the state 410 and the rewards 408. The actions are defined by migrations between devices or virtual machines in the environment 404. For example, competitive agents may operate to run their workloads as fast as they can, while collaborative agents may work to conserve resources.
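
The collaborative/competitive distinction can be made concrete as a visibility switch on the set of rewards fed back for learning; the helper below is an illustrative assumption rather than the patent's mechanism.

```python
def visible_rewards(agent_id: str, all_rewards: dict[str, float],
                    collaborative: bool) -> dict[str, float]:
    """Collaborative agents learn from every agent's rewards;
    competitive agents see only their own."""
    if collaborative:
        return dict(all_rewards)
    return {agent_id: all_rewards[agent_id]}
```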

FIG. 5 discloses aspects of an action space. In one example, the action space is defined by integer values that represent the indices of the virtual machines where the workloads can be placed. The virtual machines may have the same or different resources. For example, if a workload is placed at virtual machine V_i 502 in an environment with n virtual machines, the actions are to move or migrate (action 508) the workload to the virtual machine V_j 504, where j≠i and 0<j≤n, or to move the workload to the virtual machine V_i (action 506). The action 506 is, in effect, to keep the workload at the same virtual machine. Initially, all of the workloads are in a non-initialized state.
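
The integer action space can be written down directly. The helpers below follow the 1-indexed convention used in the text (0 < j ≤ n) and are illustrative.

```python
def action_space(n_vms: int) -> list[int]:
    """Actions are the indices 1..n of the candidate virtual machines."""
    return list(range(1, n_vms + 1))

def is_migration(action: int, current_vm: int) -> bool:
    """Choosing the current index keeps the workload in place (action 506);
    choosing any other index is a migration (action 508)."""
    return action != current_vm
```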

FIG. 6 illustrates aspects of placing a workload. In FIG. 6, a placement engine 606 may receive inputs including rewards 602 and a state 604. The rewards 602, for a particular agent, may include the actual rewards relative to the current execution of the workload of the agent at a specific virtual machine in the resources. The state 604, however, may include the states of all virtual machines in the environment. In another example, the rewards 602 may include rewards associated with all of the agents.

The placement engine 606 may output an expected reward 608 and an expected reward 610. The expected reward 608 may be the reward expected for performing a specific action, such as keeping the workload at the same node or virtual machine (e.g., action 506). The expected reward 610 may be the reward expected for performing a different action, such as migrating the workload to a different node or virtual machine (e.g., action 508).

As the number of migration destinations increases, the output of the placement engine 606 may include a reward for each virtual machine in the environment. This allows the agent 612 to select a specific reward and perform the associated action: keep the workload at the current virtual machine or migrate the workload to a new virtual machine.

FIG. 7 discloses aspects of placing workloads in an environment. The method 700 includes elements that may be performed once or less frequently than other elements. For example, the placement engine may be trained once, periodically, or continually.

The method 700 may begin by training 702 a placement engine. Next, the placement engine operates and may receive 704 input from or associated with one or more agents. The input may include a state of the environment and/or an actual reward of the workload at the current virtual machine. The placement engine then generates 706 an output, which may include an anticipated or expected reward for each of multiple actions. For example, the output may include an expected reward for each of the virtual machines in the environment. The agent may perform 708 the action with the highest expected reward. The method 700, or portions thereof, may then repeat for all of the agents. This allows agents to continually perform actions that satisfy the relevant SLAs while using the available resources effectively.
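
Putting the pieces together, one pass of the method might look like the following sketch. It reuses the illustrative helpers from the earlier sketches (encode_state, reward, select_action), and the agent attributes and the engine.observe/predict methods are assumptions rather than the patent's API.

```python
def placement_pass(engine, agents, vms, sla_rt: dict) -> None:
    """One illustrative iteration of the method 700 across all agents."""
    state = encode_state(vms)                     # shared view of every VM
    for agent in agents:
        # Actual reward of the workload at its current virtual machine.
        actual = reward(sla_rt[agent.id] - agent.response_time)
        engine.observe(agent.id, state, actual)   # feed back for training
        expected = engine.predict(state)          # expected reward per action
        agent.place(select_action(expected))      # keep or migrate (highest)
```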

Embodiments of the invention thus provide multi-agent reinforcement learning using a single model. The framework or system may rely on a set of reinforcement agents (e.g., one per workload) that can be extended as resources are added. For example, embodiments of the invention can adapt to adding/removing virtual machines. Embodiments of the invention can be implemented as a FaaS, where all of the input can be obtained by scanning the infrastructure performing the functions. A workload is submitted and placed and/or migrated during execution.

Once the placement engine is trained, placement can be performed on the fly without additional training (although additional training is not precluded). Embodiments of the invention may place workloads that have not been seen previously. Workloads are placed at the virtual machine that provides the highest reward.

The following discussion describes experiments that illustrate aspects of embodiments of the invention.

In one example, 8 virtual machines were available for 5 workloads. When the policy was to perform load balancing (no reinforcement learning), the 5 workloads were distributed across 5 of the 8 virtual machines. When the policy was to perform minimum virtual machine placement, the workloads were all placed on the same virtual machine. This led to a minor SLA violation. When the policy was reinforcement learning, the 5 workloads were distributed to 3 of the virtual machines. This allowed some power savings to be achieved (5 of the virtual machines were not powered on, compared to 3 during load balancing) while preventing SLA violations.

In another example, 8 virtual machines were available for 25 workloads. When the policy was to perform load balancing, the workloads were distributed across all 8 virtual machines. However, substantial placement changes occurred. When the policy was to perform minimum placement, the workloads were placed on the same virtual machine, which led to an SLA violation. When the policy was multi-agent reinforcement learning, the error was smaller than when performing minimum virtual machine placement. However, the SLA violation was prevented by adjusting the reward function.

Embodiments of the invention may help avoid changing the initial allocation or placement of a workload and thereby avoid the cost of allocation. Further, power can be conserved by reducing the active number of virtual machines, compared to a load balanced placement policy. Further, adjustments to the reward function can help prevent SLA violations during execution.

In another experiment, 20 workloads were placed on 16 virtual machines. When the policy was load balance placement, the load was distributed across all 16 virtual machines. However, many load placement changes occurred.

When the policy was minimum virtual machine placement, some of the virtual machines were never used. When the policy was reinforcement learning in accordance with embodiments of the invention, resources were saved (e.g., some virtual machines were not used or had less use) while the load was distributed between the virtual machines that were used.

In another experiment, 75 workloads were placed on 16 virtual machines. When the policy was load balance placement, the load was distributed across all 16 virtual machines. However, there were many placement changes, which may impact the SLA requirements depending on the deployment time. When the policy was minimum virtual machine placement, the number of powered-on virtual machines was reduced and the workloads were all placed on the same virtual machine. However, this led to an SLA violation. When the policy was reinforcement learning, power and/or resources were conserved while, at the same time, the workloads were distributed across a portion of the virtual machines. Thus, the reinforcement learning placement approach demonstrated aspects of both the load balancing and minimum virtual machine placement policies.

Embodiments of the invention advantageously reduced the number of changes to the initial allocation, since each change would increase the response time when considering the time needed to deploy the workloads. Embodiments of the invention further conserved power by reducing the number of active virtual machines compared to the load balance placement policy. Further, embodiments of the invention avoided SLA violations in this example.

The following is a discussion of aspects of example operating environments for various embodiments of the invention. This discussion is not intended to limit the scope of the invention, or the applicability of the embodiments, in any way.

In general, embodiments of the invention may be implemented in connection with systems, software, and components that individually and/or collectively implement, and/or cause the implementation of, placement operations including reward determination operations, reinforcement learning operations, workload migration operations, or the like. More generally, the scope of the invention embraces any operating environment in which the disclosed concepts may be useful.

At least some embodiments of the invention provide for the implementation of the disclosed functionality in existing backup platforms, examples of which include the Dell-EMC NetWorker and Avamar platforms and associated backup software, and storage environments such as the Dell-EMC DataDomain storage environment. In general, however, the scope of the invention is not limited to any particular data backup platform or data storage environment. The workloads may include, for example, backup operations, deduplication operations, segmenting operations, fingerprint operations, or the like, or combinations thereof.

New and/or modified data collected and/or generated in connection with some embodiments may be stored in a data protection environment that may take the form of a public or private cloud storage environment, an on-premises storage environment, or a hybrid storage environment that includes public and private elements. Any of these example storage environments may be partly, or completely, virtualized. The storage environment may comprise, or consist of, a datacenter which is operable to service read, write, delete, backup, restore, and/or cloning operations initiated by one or more clients or other elements of the operating environment. Where a backup comprises groups of data with different respective characteristics, that data may be allocated, and stored, to different respective targets in the storage environment, where the targets each correspond to a data group having one or more particular characteristics.

Example cloud computing environments, which may or may not be public, include storage environments that may provide data protection functionality for one or more clients. Another example of a cloud computing environment is one in which processing, data protection, and other services may be performed on behalf of one or more clients. Some example cloud computing environments in connection with which embodiments of the invention may be employed include, but are not limited to, Microsoft Azure, Amazon AWS, Dell EMC Cloud Storage Services, and Google Cloud. More generally, however, the scope of the invention is not limited to employment of any particular type or implementation of cloud computing environment.

In addition to the cloud environment, the operating environment may also include one or more clients that are capable of collecting, modifying, and creating data. As such, a particular client may employ, or otherwise be associated with, one or more instances of each of one or more applications that perform such operations with respect to data. Such clients may comprise physical machines, containers, or virtual machines (VMs).

It is noted that any of the disclosed processes, operations, methods, and/or any portion of any of these, may be performed in response to, as a result of, and/or based upon, the performance of any preceding process(es), methods, and/or operations. Correspondingly, performance of one or more processes, for example, may be a predicate or trigger to subsequent performance of one or more additional processes, operations, and/or methods. Thus, for example, the various processes that may make up a method may be linked together or otherwise associated with each other by way of relations such as the examples just noted. Finally, and while it is not required, the individual processes that make up the various example methods disclosed herein are, in some embodiments, performed in the specific sequence recited in those examples. In other embodiments, the individual processes that make up a disclosed method may be performed in a sequence other than the specific sequence recited.

Following are some further example embodiments of the invention. These are presented only by way of example and are not intended to limit the scope of the invention in any way.

Embodiment 1. A method comprising: receiving input into a placement engine, the input including an actual reward of a workload operating in an environment included in resources and a state, generating expected rewards including a first expected reward corresponding to a first action and a second expected reward corresponding to a second action, performing the first action on the workload, by an agent associated with the workload, when the first expected reward is higher than the second expected reward, and performing the second action on the workload when the second expected reward is higher than the first expected reward.

Embodiment 2. The method of embodiment 1, wherein the environment comprises a current virtual machine and wherein the actual reward corresponds to a service level agreement metric of the workload operating at the current virtual machine.

Embodiment 3. The method of embodiment 1 and/or 2, wherein the state includes a one-hot encoding style of all environments, each of the environments including a virtual machine.

Embodiment 4. The method of embodiment 1, 2, and/or 3, wherein the one-hot encoding style includes a resource usage per virtual machine, a resource usage per workload, a state of each workload, and a time to completion for each workload, using floating point values.

Embodiment 5. The method of embodiment 1, 2, 3, and/or 4, wherein the first action is to keep the workload at the current virtual machine and wherein the second action is to migrate the workload to a different virtual machine.

Embodiment 6. The method of embodiment 1, 2, 3, 4, and/or 5, wherein the placement engine comprises a neural network configured to map the input to expected rewards.

Embodiment 7. The method of embodiment 1, 2, 3, 4, 5, and/or 6, wherein the placement engine outputs an expected reward for performing an action relative to each of the virtual machines in the resources.

Embodiment 8. The method of embodiment 1, 2, 3, 4, 5, 6, and/or 7, further comprising adjusting a reward function when an SLA violation is detected.

Embodiment 9. The method of embodiment 1, 2, 3, 4, 5, 6, 7, and/or 8, wherein the reward function is:

$f\left( \Delta,\sigma_{L},\sigma_{R} \right) = e^{\frac{-\Delta^{2}}{2\sigma_{L}^{2}}}\ \text{if}\ \Delta > 0,\ \text{otherwise}\ e^{\frac{-\Delta^{2}}{2\sigma_{R}^{2}}},$

wherein Δ is a difference between an SLA response time metric and an actual response time for the workload in the environment, and wherein σ_L and σ_R define, respectively, how fast a left and a right portion of the reward function decay.

Embodiment 10. The method of embodiment 1, 2, 3, 4, 5, 6, 7, 8, and/or 9, wherein the placement engine is trained by randomly migrating workloads amongst virtual machines in the resources.

Embodiment 11. The method of embodiment 1, 2, 3, 4, 5, 6, 7, 8, 9, and/or 10, wherein the placement engine is configured to place the workload in a manner that includes both minimum virtual machine placement and load balancing.

Embodiment 12. A method for performing any of the operations, methods, or processes, or any portion of any of these, or any combination thereof, disclosed herein.

Embodiment 13. A non-transitory storage medium having stored therein instructions that are executable by one or more hardware processors to perform operations comprising the operations of any one or more of embodiments 1-12.

The embodiments disclosed herein may include the use of a special purpose or general-purpose computer including various computer hardware or software modules, as discussed in greater detail below. A computer may include a processor and computer storage media carrying instructions that, when executed by the processor and/or caused to be executed by the processor, perform any one or more of the methods disclosed herein, or any part(s) of any method disclosed.

As indicated above, embodiments within the scope of the present invention also include computer storage media, which are physical media for carrying or having computer-executable instructions or data structures stored thereon. Such computer storage media may be any available physical media that may be accessed by a general purpose or special purpose computer.

By way of example, and not limitation, such computer storage media may comprise hardware storage such as solid state disk/device (SSD), RAM, ROM, EEPROM, CD-ROM, flash memory, phase-change memory (“PCM”), or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other hardware storage devices which may be used to store program code in the form of computer-executable instructions or data structures, which may be accessed and executed by a general-purpose or special-purpose computer system to implement the disclosed functionality of the invention. Combinations of the above should also be included within the scope of computer storage media. Such media are also examples of non-transitory storage media, and non-transitory storage media also embraces cloud-based storage systems and structures, although the scope of the invention is not limited to these examples of non-transitory storage media.

Computer-executable instructions comprise, for example, instructions and data which, when executed, cause a general-purpose computer, special purpose computer, or special purpose processing device to perform a certain function or group of functions. As such, some embodiments of the invention may be downloadable to one or more systems or devices, for example, from a website, mesh topology, or other source. As well, the scope of the invention embraces any hardware system or device that comprises an instance of an application that comprises the disclosed executable instructions.

Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts disclosed herein are disclosed as example forms of implementing the claims.

As used herein, the term ‘module’, ‘component’, ‘engine’, or ‘agent’ may refer to software objects or routines that execute on the computing system. The different components, modules, engines, and services described herein may be implemented as objects or processes that execute on the computing system, for example, as separate threads. While the system and methods described herein may be implemented in software, implementations in hardware or a combination of software and hardware are also possible and contemplated. In the present disclosure, a ‘computing entity’ may be any computing system as previously defined herein, or any module or combination of modules running on a computing system.

In at least some instances, a hardware processor is provided that is operable to carry out executable instructions for performing a method or process, such as the methods and processes disclosed herein. The hardware processor may or may not comprise an element of other hardware, such as the computing devices and systems disclosed herein.

In terms of computing environments, embodiments of the invention may be performed in client-server environments, whether network or local environments, or in any other suitable environment. Suitable operating environments for at least some embodiments of the invention include cloud computing environments where one or more of a client, server, or other machine may reside and operate in a cloud environment.

With reference briefly now to FIG. 8, any one or more of the entities disclosed, or implied, by the Figures and/or elsewhere herein may take the form of, or include, or be implemented on, or hosted by, a physical computing device, a container, and/or a virtual machine, one example of which is denoted at 800. As well, where any of the aforementioned elements comprise or consist of a virtual machine (VM) or a container, that VM or container may constitute a virtualization of any combination of the physical components disclosed in FIG. 8.

In the example of FIG. 8, the physical computing device 800 includes a memory 802 which may include one, some, or all, of random access memory (RAM), non-volatile memory (NVM) 804 such as NVRAM for example, read-only memory (ROM), and persistent memory, one or more hardware processors 806, non-transitory storage media 808, UI device 810, and data storage 812. One or more of the memory components 802 of the physical computing device 800 may take the form of solid-state device (SSD) storage. As well, one or more applications 814 may be provided that comprise instructions executable by one or more hardware processors 806 to perform any of the operations, or portions thereof, disclosed herein.

Such executable instructions may take various forms including, for example, instructions executable to perform any method or portion thereof disclosed herein, and/or executable by/at any of a storage site, whether on-premises at an enterprise, or a cloud computing site, client, datacenter, data protection site including a cloud storage site, or backup server, to perform any of the functions disclosed herein. As well, such instructions may be executable to perform any of the other operations and methods, and any portions thereof, disclosed herein.

The present invention may be embodied in other specific forms without departing from its spirit or essential characteristics. The described embodiments are to be considered in all respects only as illustrative and not restrictive. The scope of the invention is, therefore, indicated by the appended claims rather than by the foregoing description. All changes which come within the meaning and range of equivalency of the claims are to be embraced within their scope.

What is claimed is:
1. A method comprising: receiving input into a placement engine, the input including an actual reward of a workload operating in an environment included in resources and a state; generating expected rewards including a first expected reward corresponding to a first action and a second expected reward corresponding to a second action; performing the first action on the workload, by an agent associated with the workload, when the first expected reward is higher than the second expected reward; and performing the second action on the workload when the second expected reward is higher than the first expected reward.

2. The method of claim 1, wherein the environment comprises a current virtual machine and wherein the actual reward corresponds to a service level agreement metric of the workload operating at the current virtual machine.

3. The method of claim 2, wherein the state includes a one-hot encoding style of all environments, each of the environments including a virtual machine.

4. The method of claim 3, wherein the one-hot encoding style includes a resource usage per virtual machine, a resource usage per workload, a state of each workload, and a time to completion for each workload, using floating point values.

5. The method of claim 2, wherein the first action is to keep the workload at the current virtual machine and wherein the second action is to migrate the workload to a different virtual machine.

6. The method of claim 1, wherein the placement engine comprises a neural network configured to map the input to expected rewards.

7. The method of claim 1, wherein the placement engine outputs an expected reward for performing an action relative to each of the virtual machines in the resources.

8. The method of claim 1, further comprising adjusting a reward function when an SLA violation is detected.

9. The method of claim 8, wherein the reward function is: $f\left( \Delta,\sigma_{L},\sigma_{R} \right) = e^{\frac{-\Delta^{2}}{2\sigma_{L}^{2}}}\ \text{if}\ \Delta > 0,\ \text{otherwise}\ e^{\frac{-\Delta^{2}}{2\sigma_{R}^{2}}},$ wherein Δ is a difference between an SLA response time metric and an actual response time for the workload in the environment, and wherein σ_L and σ_R define, respectively, how fast a left and a right portion of the reward function decay.

10. The method of claim 1, wherein the placement engine is trained by randomly migrating workloads amongst virtual machines in the resources.

11. The method of claim 1, wherein the placement engine is configured to place the workload in a manner that includes both minimum virtual machine placement and load balancing.

12. A non-transitory storage medium having stored therein instructions that are executable by one or more hardware processors to perform operations comprising: receiving input into a placement engine, the input including an actual reward of a workload operating in an environment included in resources and a state; generating expected rewards including a first expected reward corresponding to a first action and a second expected reward corresponding to a second action; performing the first action on the workload, by an agent associated with the workload, when the first expected reward is higher than the second expected reward; and performing the second action on the workload when the second expected reward is higher than the first expected reward.

13. The non-transitory storage medium of claim 12, wherein the environment comprises a current virtual machine and wherein the actual reward corresponds to a service level agreement metric of the workload operating at the current virtual machine.

14. The non-transitory storage medium of claim 13, wherein the state includes a one-hot encoding style of all environments, each of the environments including a virtual machine.

15. The non-transitory storage medium of claim 14, wherein the one-hot encoding style includes a resource usage per virtual machine, a resource usage per workload, a state of each workload, and a time to completion for each workload, using floating point values.

16. The non-transitory storage medium of claim 13, wherein the first action is to keep the workload at the current virtual machine and wherein the second action is to migrate the workload to a different virtual machine.

17. The non-transitory storage medium of claim 12, wherein the placement engine comprises a neural network configured to map the input to expected rewards, wherein the placement engine is configured to place the workload in a manner that includes both minimum virtual machine placement and load balancing.

18. The non-transitory storage medium of claim 12, wherein the placement engine outputs an expected reward for performing an action relative to each of the virtual machines in the resources.

19. The non-transitory storage medium of claim 12, wherein the operations further comprise adjusting a reward function when an SLA violation is detected.

20. The non-transitory storage medium of claim 19, wherein the reward function is: $f\left( \Delta,\sigma_{L},\sigma_{R} \right) = e^{\frac{-\Delta^{2}}{2\sigma_{L}^{2}}}\ \text{if}\ \Delta > 0,\ \text{otherwise}\ e^{\frac{-\Delta^{2}}{2\sigma_{R}^{2}}},$ wherein Δ is a difference between an SLA response time metric and an actual response time for the workload in the environment, and wherein σ_L and σ_R define, respectively, how fast a left and a right portion of the reward function decay.